Abstract. In the last decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance. Despite the great deal of research work devoted to this task, most of the proposed solutions prove to be unreliable when the intrinsic dimensionality of the input dataset is high and the manifold where the points lie is nonlinearly embedded in a higher dimensional space. In this paper we propose a novel robust intrinsic dimensionality estimator that exploits the twofold complementary information conveyed both by the normalized nearest neighbor distances and by the angles computed on couples of neighboring points, also providing closed forms for the Kullback-Leibler divergences of the respective distributions. Experiments performed on both synthetic and real datasets highlight the robustness and the effectiveness of the proposed algorithm when compared to state of the art methodologies.

Keywords: Intrinsic dimensionality estimation, manifold learning, von 
Mises distribution, Kullback-Leibler divergence. 



1 Introduction 

Given a dataset $X_N = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^D$, its intrinsic dimension (id) is the minimum number of parameters needed to represent the data without information loss. In the last decade a great deal of research work has been devoted to the development of id estimation algorithms; to this aim, the feature vectors $x_i$ are generally viewed as points constrained to lie on a low dimensional manifold $\mathcal{M} \subseteq \mathbb{R}^d$ embedded in a higher dimensional space $\mathbb{R}^D$, where d is the id to be estimated. In more general terms, according to [15], $X_N$ is said to have id equal to $d \in \{1..D\}$ if its elements lie entirely within a d-dimensional subspace of $\mathbb{R}^D$.

The id is a very useful piece of information for the following reasons. First, dimensionality reduction techniques, which are often used to reduce the "curse of dimensionality" effect [21] by computing a more compact representation of the data, are profitable when the number of projection dimensions is the minimal one that allows retaining the maximum amount of useful information expressed by the data. Furthermore, when using an auto-associative neural network [23] to perform a nonlinear feature extraction, the id can suggest a reasonable value for the number of hidden neurons. Moreover, according to statistical learning theory [38], the capacity and the generalization capability of a classifier may depend on the id. In particular, in [14] the authors remark that, in order to balance a classifier's generalization ability and its empirical error, the complexity of the classification model should also be related to the id of the available dataset. Finally, as recently shown in [4], id estimation methods are used to evaluate the model order in a time series, which is crucial to make reliable time series predictions; this consideration is supported by the fact that the domain of attraction of a nonlinear dynamic system has a very complex geometric structure, and the studies on the geometry of the attraction domain are closely related to fractal geometry, and therefore to fractal dimension.

Unfortunately, even if a great deal of research work has focused on the development of id estimators, and several interesting techniques have been presented in the literature, to our knowledge only a few methods [5, 28, 34, 33] have investigated the problem of input datasets having a sufficiently high id (that is, id ≥ 10) and being drawn from manifolds nonlinearly embedded in higher dimensional spaces; this fact is also highlighted by the experiments reported in this paper, which show that well-known techniques fail when dealing with this kind of data. More precisely, several methods underestimate the id when its value is too high. These considerations led us to the development of an id estimator, called "DANCo" (Dimensionality from Angle and Norm Concentration), that is less affected by underestimation problems, as shown by experiments on both synthetic and real datasets and by the comparison of the achieved results with those reported by state of the art algorithms. The peculiarities and strengths of the proposed estimator lie in the joint use of normalized nearest neighbor distances and mutual angles, whose coupled exploitation reduces the effects of well-known problems such as the curse of dimensionality, the edge effect, and overall orthogonality.

This paper is organized as follows: in Section 2 previous works on id estimators are reviewed. In Section 3 the basic theoretical results laying the foundations for the proposed estimator are presented. Section 4 sketches the proposed algorithm, providing a concise analysis of its properties. A detailed comparison with state of the art methodologies on a wide family of datasets is summarized in Section 5. Finally, Section 6 reports conclusions and future works.

2 Related Works 

In this section we summarize the literature related to id estimation methods; 
note that a more detailed description is reported in the survey [3]. 

The most cited id estimator is Principal Component Analysis (PCA) [22], which projects the input dataset on the d directions of maximum variance (principal components, PCs). Exploiting PCA, the estimated id is the number of PCs whose corresponding normalized eigenvalues are higher than a threshold parameter, which is usually difficult to set. More accurate results can be obtained by applying a local PCA [15] that determines the id by combining local estimates computed in small subregions of the dataset; unfortunately, complications arise in the identification of local regions and in the selection of thresholds [39]. In [1] Bishop describes a Bayesian treatment of PCA (BPCA) to automatically estimate the id of the input dataset. This technique has been extended in [27] to cope with exponential family distributions, but this method requires knowledge of the distribution underlying the data. To achieve an automatic selection of meaningful PCs, in [18] the authors propose Sparse Probabilistic Principal Component Analysis (SPPCA), which exploits the sparsity of the projection matrix through a probabilistic Bayesian formulation. PCA-based methods, such as those previously mentioned, are usually classified as projection methods [3, 26] since they search for the best subspace onto which to project the data; unfortunately, they cannot provide reliable id estimates since they are too sensitive to noise and parameter settings [26].

Geometric id estimation methods [55] are most often based on some statistics related to either the distances between neighboring points or the fractal dimension, expressing them as functions of the id of the embedded manifold. The most popular fractal dimension estimator is the Correlation Dimension (CD) [17], which is based on the assumption that the volume of a d-dimensional set scales as $r^d$ with its size r. Since the performance of CD is affected by the choice of the scale r, in [19] the authors suggest an estimator (here called Hein) based on the asymptotes of a smoothed version of the CD estimate. In [11] the authors present an algorithm to estimate the id of a manifold in a small neighborhood of a selected point, and they analyze its finite-sample convergence properties. Another technique, based on the analysis of point neighborhoods, is the Maximum Likelihood Estimator (MLE) [26], which applies the principle of maximum likelihood to the distances between neighboring points. In [8] the authors propose an algorithm that exploits entropic graphs to estimate both the id and the intrinsic entropy of a manifold; they test their method by adopting either the Geodesic Minimal Spanning Tree (GMST [7]), where the arc weights are the geodesic distances computed through the ISOMAP algorithm [35], or the more efficient kNN-graph (kNNG [5]), where the arc weights are based on Euclidean distances.

We note that many neighborhood-based estimators usually underestimate the id when its value is sufficiently high and, to our knowledge, only a few works address this problem [5, 34, 28]. Indeed, as shown in [10], the number of sample points required to perform dimensionality estimation grows exponentially with the id ("curse of dimensionality"). For this reason, when the dimensionality is too high, the number of sample points practically available is insufficient to compute an acceptable id estimate. Moreover, the ratio between the points close to the edge of the manifold and the points inside it increases in probability when the dimensionality increases ("edge effect" [39]), affecting the results achieved by estimators based on statistics related to the behavior of point neighborhoods.

In [5], the authors propose an empirical id correction procedure based on the estimation of the error obtained on synthetically produced datasets of known dimensionality. More precisely, after generating D datasets characterized by incremental id values ($d_i \in \{1..D\}$), the authors apply the CD algorithm [17] to estimate the id ($\hat d_i$) of each dataset. Fitting the points $(d_i, \hat d_i)$ they obtain the so-called "correction curve" used to adjust the id estimates. In [34] a local estimator (called IDEA) based on an asymptotic correction is proposed. To this aim, given a dataset of unknown id, random subsets of different cardinalities are extracted and their id estimates are computed; the two-dimensional points composed of the cardinality of each subset and its id estimate are then fitted with a curve having a horizontal asymptote whose ordinate is the final id. In [28] the authors describe a method (called MiND_KL) based on the comparison between the empirical probability density function of the neighborhood distances computed on the dataset and the distribution of the neighborhood distances computed from points uniformly drawn from hyperspheres of known increasing dimensionality; the id estimate is the one minimizing the Kullback-Leibler divergence (KL).



3 Theoretical Results 

Consider a manifold $\mathcal{M} \equiv \mathbb{R}^d$ embedded in a higher dimensional space $\mathbb{R}^D$ through a locally isometric nonlinear smooth map $\phi: \mathcal{M} \to \mathbb{R}^D$; to estimate the id of $\mathcal{M}$ by means of points drawn from the embedded manifold through a smooth probability density function (pdf) f, we need to identify a "mathematical object" depending only on d, and we should define a consistent estimator for d based on it.

Assume by hypothesis that the employed manifold sampling process is driven by a smooth pdf f; moreover, consider a spherical neighborhood of the origin $0_d$ having radius ε; denoting with $\chi_{B_d(0_d,1)}$ the indicator function on the unit ball $B_d(0_d, 1)$, the pdf restricted to such a neighborhood is:

$$f_\epsilon(z) = \frac{f(\epsilon z)\,\chi_{B_d(0_d,1)}(z)}{\int_{B_d(0_d,1)} f(\epsilon t)\,dt} \qquad (1)$$

In [28] the authors prove the following: 

Theorem 1. Given $\{\epsilon_i\} \to 0^+$, Equation (1) describes a sequence of pdfs having the unit d-dimensional ball as support; such sequence converges uniformly to the uniform distribution in the ball $B_d(0_d, 1)$.

Theorem 1 ensures that, from a theoretical standpoint, in our setting it is possible to assume uniformly distributed points in every neighborhood of $\mathcal{M}$; in other words, it is possible to define consistent estimators based on local information, assuming without loss of generality that the normalized points are uniformly drawn from the unit hypersphere.

Our technique exploits the statistical properties of norms and mutual angles computed on points drawn from uniformly sampled hyperspheres; to this aim, in Sections 3.1 and 3.2 we sketch the statistical properties of norms and angles respectively, while in Section 3.3 we describe how both the above properties can be simultaneously used to define a consistent estimator of the manifold's id.
3.1 Concentration of Norms



Consider initially the problem of estimating the id of $\mathcal{M}$ by means of a sample $\{z_i\}_{i=1}^{N}$ of points uniformly drawn from $B_d(0_d, 1)$; to this aim, we exploit the concentration of norms, which is dimensionality-dependent.

In [28] it is shown that the pdf associated to the normalized distance r between the hypersphere center and its nearest neighbor is the following:

$$g(r; k, d) = k d\, r^{d-1} (1 - r^d)^{k-1} \qquad (2)$$
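As a sanity check of Equation (2), the sketch below (assuming NumPy; all function names are illustrative) places k points uniformly in the unit d-ball, records the distance of the closest one from the center, and compares the empirical mean with the mean implied by g. Substituting $u = r^d$ shows that u is Beta(1, k) distributed under g, so $E[r] = \Gamma(1 + 1/d)\,\Gamma(k + 1) / \Gamma(k + 1 + 1/d)$.

```python
import math
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the unit d-dimensional ball."""
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)      # uniform directions on S^(d-1)
    return x * (rng.random(n) ** (1.0 / d))[:, None]   # radius with pdf d * r^(d-1)

def nn_distance_from_center(k, d, trials, rng):
    """For each trial, distance from the center to the closest of k uniform points."""
    pts = sample_unit_ball(trials * k, d, rng).reshape(trials, k, d)
    return np.linalg.norm(pts, axis=2).min(axis=1)

def mean_of_g(k, d):
    """E[r] under g(r; k, d) = k d r^(d-1) (1 - r^d)^(k-1)."""
    return math.exp(math.lgamma(1 + 1 / d) + math.lgamma(k + 1) - math.lgamma(k + 1 + 1 / d))

rng = np.random.default_rng(0)
r = nn_distance_from_center(k=10, d=5, trials=20000, rng=rng)
print(r.mean(), mean_of_g(10, 5))  # the empirical and theoretical means should be close
```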

Theorem 1 proves that the convergence of $f_\epsilon$ to the uniform pdf on $B_d(0_d, 1)$ is uniform, so that when $\epsilon \to 0^+$ the pdf related to the geodesic distances $\delta_{\mathcal{M}}(\phi(0_d), \phi(z))$ converges to the pdf g defined in Equation (2). Notice that, once k is fixed, $\mathcal{G} = \{g(r; k, d)\}_{d=1}^{D}$ represents a finite family of D pdfs for all the parameter values $1 \le d \le D$.

As reported in [28], a Maximum Likelihood (ML) estimator could be found for the parameter d of g, but the resulting estimate may be poor due to the use of the kNN algorithm. More precisely, in high dimensional spaces, the kNN method is strongly affected by the edge effect [3, 5], which reduces the quality of the neighborhood estimation.

To obtain a more reliable estimate of d, in [28] the authors propose to minimize the KL divergence between the pdf computed on the dataset and those calculated on synthetic data of known ids; to this aim, they adopted the KL estimation method proposed in [41].

However, although this KL approach can be applied to every dataset without any restriction on the underlying distribution, in our problem the closed form of the KL divergence between two minimum neighbor distance pdfs can be derived analytically. To this aim, once the parameter k is fixed, we need to estimate the parameter d in g; to accomplish this task, we decided to employ the ML estimator proposed in [25]. Calling $\hat d_{ML}$ the ML estimate obtained on the dataset, and $\hat d_{d,ML}$ the ML estimates obtained by means of points sampled from d-dimensional hyperspheres¹ (for $d \in \{1..D\}$), the closed form of the KL divergence for the minimum neighbor distances is:

$$KL_d = \mathcal{KL}\big(g(\cdot; k, \hat d_{ML}),\, g(\cdot; k, \hat d_{d,ML})\big) = \int_0^1 g(r; k, \hat d_{ML}) \log\frac{g(r; k, \hat d_{ML})}{g(r; k, \hat d_{d,ML})}\,dr$$

$$= H_k \frac{\hat d_{d,ML}}{\hat d_{ML}} - H_{k-1} - 1 - \log\frac{\hat d_{d,ML}}{\hat d_{ML}} - (k-1)\sum_{i=0}^{k} (-1)^i \binom{k}{i}\, \psi\!\left(1 + \frac{i\,\hat d_{ML}}{\hat d_{d,ML}}\right) \qquad (3)$$

where $\mathcal{KL}(\cdot, \cdot)$ is the KL divergence operator, $H_k$ represents the k-th harmonic number ($H_k = \sum_{i=1}^{k} 1/i$), and $\psi(\cdot)$ is the digamma function.
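The closed form in Equation (3) can be checked against direct numerical integration of its defining integral. The sketch below is plain Python; the digamma routine (upward recurrence followed by the asymptotic series) is written out only to keep the example self-contained, and all names are illustrative.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x above 10 by recurrence, then use the asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return acc + math.log(x) - 0.5 / x - series

def kl_norms(d_hat, d_ref, k):
    """Closed form of Equation (3): KL(g(.; k, d_hat) || g(.; k, d_ref))."""
    h_k = sum(1.0 / i for i in range(1, k + 1))
    h_km1 = h_k - 1.0 / k
    quo = d_ref / d_hat
    s = sum((-1) ** i * math.comb(k, i) * digamma(1.0 + i / quo) for i in range(k + 1))
    return h_k * quo - h_km1 - 1.0 - math.log(quo) - (k - 1) * s

def kl_norms_numeric(d_hat, d_ref, k, n=100000):
    """Midpoint-rule integration of g(r; k, d_hat) * log(g(r; k, d_hat) / g(r; k, d_ref))."""
    def log_g(r, d):
        return math.log(k * d) + (d - 1) * math.log(r) + (k - 1) * math.log1p(-r ** d)
    total = 0.0
    for j in range(n):
        r = (j + 0.5) / n
        lg_hat = log_g(r, d_hat)
        total += math.exp(lg_hat) * (lg_hat - log_g(r, d_ref))
    return total / n

print(kl_norms(3.0, 3.0, 10))                                  # ~0 for identical parameters
print(kl_norms(3.0, 5.0, 10), kl_norms_numeric(3.0, 5.0, 10))  # closed form vs. quadrature
```

Because the constant terms $-H_{k-1} - 1$ do not depend on $\hat d_{d,ML}$, they do not affect which d minimizes $KL_d$; they only make the expression an exact divergence (zero when the two parameters coincide).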



¹ Notice that, due to the kNN bias effect described above, the ML estimates $\hat d_{d,ML}$ are biased w.r.t. the real value d employed in the sampling process, and a similar bias can be observed also in the estimate $\hat d_{ML}$.
3.2 Concentration of Angles

As happens for norms, in high dimensions the pairwise angles among k uniformly distributed unit vectors $\{x_i\}_{i=1}^{k}$ on the (d−1)-dimensional surface $S^{d-1}$ of a hypersphere in $\mathbb{R}^d$ are subject to the concentration of their values. The common belief that in high dimensions such vectors tend to be orthogonal to each other found partial theoretical justification in the past [30], but only in the last decade has a deeper investigation allowed a more precise characterization of this fact [35].

Two of the most common distributions adopted in circular and directional statistics are the von Mises distribution (VM) and its high-dimensional generalization, termed von Mises-Fisher distribution (VMF). More precisely, for $x \in S^{d-1}$, the VMF distribution has the following form:

$$f(x; \nu, \tau) = C_d(\tau)\, e^{\tau \nu^T x} \qquad (4)$$

where the unit vector ν denotes the mean direction, and the concentration parameter $\tau \ge 0$ takes high values when the distribution is highly concentrated around the mean direction. In particular, $\tau = 0$ when points are uniformly distributed on $S^{d-1}$. Moreover, the normalization constant $C_d(\tau)$ in Equation (4) takes the following form:

$$C_d(\tau) = \frac{\tau^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\tau)}$$

where $I_v$ is the modified Bessel function of the first kind with order v. Due to the normalization factor, this pdf is difficult to use in theoretical derivations; moreover, under the assumptions of Theorem 1 no information about d may be estimated from the knowledge of the parameters (ν, τ), since ν is uninformative when the hypersolid angles are uniformly distributed (τ = 0), which is the case of a uniformly sampled hypersphere.

Therefore, to infer the id of $\mathcal{M}$ by exploiting the angular information, we focused on the distribution of the angles θ computed between independent pairs of random points chosen in neighborhoods of $\mathbb{R}^d$ and sampled from the uniform distribution in the hypersphere. Note that working on pairwise angles allows us both to exploit the concentration factor τ, which is strictly related to the dimensionality d as we will show, and to rely on the VM distribution, which is more tractable than the VMF pdf.

With the above notation, considering the angle $\theta \in [-\pi, \pi]$ between two vectors, the VM pdf of θ reads as:

$$q(\theta; \nu, \tau) = \frac{e^{\tau \cos(\theta - \nu)}}{2\pi I_0(\tau)} \qquad (5)$$



with the same parameters and notation adopted for the VMF pdf. Intuitively, the VM distribution is the circular counterpart of the normal distribution on a line, sharing with the latter many interesting properties [2]. To understand the link between τ and d, we first recall that $q(\theta; \nu, \tau)$ is unimodal for τ > 0, like a Gaussian random variable peaked around its mean. Next, according to the following theorem, increasing values of τ are expected for points uniformly drawn from hyperspheres with increasing dimensionality d.

Theorem 2. Given two independent random unit vectors $(x_1, x_2)$ in $\mathbb{R}^d$, chosen from a uniform distribution on $S^{d-1}$, the concentration parameter τ of the VM distribution describing the angle θ between $x_1$ and $x_2$ converges asymptotically to the dimensionality d.

Proof. Consider the following results:

i) for large concentration values τ, a VM distribution with parameters (ν, τ) becomes a Gaussian distribution with mean ν and standard deviation $1/\sqrt{\tau}$ [20];
ii) performing the variable substitution $\tilde\theta = \sqrt{d}\,(\theta - \pi/2)$, the resulting random variable converges in distribution to a standard normal one [35].

Combining i) and ii), it follows that θ asymptotically follows a Gaussian pdf with mean $\nu = \pi/2$ and standard deviation $\sigma = 1/\sqrt{\tau} = 1/\sqrt{d}$, which holds only when τ = d.
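Point ii) of the proof can be checked empirically with a short simulation (assuming NumPy; the helper name is illustrative): the angle between two independent uniform unit vectors in $\mathbb{R}^d$ concentrates around π/2, with a standard deviation that approaches $1/\sqrt{d}$ as d grows.

```python
import numpy as np

def random_pair_angles(d, n_pairs, rng):
    """Angles between n_pairs independent pairs of uniform unit vectors in R^d."""
    a = rng.standard_normal((n_pairs, d))
    b = rng.standard_normal((n_pairs, d))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    cos = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)   # guard against rounding outside [-1, 1]
    return np.arccos(cos)

rng = np.random.default_rng(1)
stats = {}
for d in (5, 20, 80):
    theta = random_pair_angles(d, 20000, rng)
    stats[d] = (theta.mean(), theta.std())
    # mean should approach pi/2 and std * sqrt(d) should approach 1 as d grows
    print(d, stats[d][0], stats[d][1] * np.sqrt(d))
```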

Theorem 2 has both a general and a specific value. First, it formally proves the existence of the concentration of angles in high dimensions, stating both an asymptotic linear relation between concentration and dimensionality and the orthogonality between any couple of infinite-dimensional vectors. Moreover, Theorem 2 allows us to estimate the id (d) of the observed points through the estimation of the concentration parameter τ.

The methodology we propose in Section 4 employs both the ML estimation of the VM parameters ν and τ, and the KL divergence between the VM pdf estimated from the observed dataset and those computed on synthetic data of known ids. Assuming that $\{\theta_1, \ldots, \theta_N\}$ is a sample drawn from a VM distribution with parameters (ν, τ), the ML estimate of the population direction ν equals the sample mean direction; more precisely:

$$\hat\nu = \arctan\frac{\sum_{i=1}^{N}\sin\theta_i}{\sum_{i=1}^{N}\cos\theta_i} \qquad (6)$$

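Likewise, the ML estimate of the concentration parameter τ is the value $\hat\tau$ that solves $\eta = \frac{I_1(\hat\tau)}{I_0(\hat\tau)} = A(\hat\tau)$, where η is the norm of the sample mean vector defined in [37] as:

$$\eta = \sqrt{\left(\frac{1}{N}\sum_{i=1}^{N}\cos\theta_i\right)^2 + \left(\frac{1}{N}\sum_{i=1}^{N}\sin\theta_i\right)^2} \qquad (7)$$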

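Since A cannot be inverted in closed form, we rely on the well-known and widely used approximation proposed in [12], which approximates $A^{-1}(\eta)$ by:

$$A^{-1}(\eta) \approx \begin{cases} 2\eta + \eta^3 + \frac{5\eta^5}{6} & \eta < 0.53 \\ -0.4 + 1.39\eta + \frac{0.43}{1-\eta} & 0.53 \le \eta < 0.85 \\ \frac{1}{\eta^3 - 4\eta^2 + 3\eta} & \eta \ge 0.85 \end{cases} \qquad (8)$$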
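A sketch of the ML estimation in Equations (6)-(8), assuming NumPy; the function names are illustrative. The usage lines also illustrate Theorem 2 by estimating τ from the angles between random unit vectors in $\mathbb{R}^{50}$, which should give $\hat\tau$ close to 50.

```python
import numpy as np

def a_inverse(eta):
    """Piecewise approximation of A^{-1}(eta) from Equation (8)."""
    eta = min(float(eta), 1.0 - 1e-12)      # guard: eta = 1 would mean infinite concentration
    if eta < 0.53:
        return 2 * eta + eta ** 3 + 5 * eta ** 5 / 6
    if eta < 0.85:
        return -0.4 + 1.39 * eta + 0.43 / (1 - eta)
    return 1.0 / (eta ** 3 - 4 * eta ** 2 + 3 * eta)

def vm_ml(theta):
    """ML estimates (nu_hat, tau_hat) for a sample of angles, Equations (6)-(8)."""
    c, s = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    nu_hat = np.arctan2(s, c)        # quadrant-aware version of Equation (6)
    eta = np.hypot(c, s)             # mean resultant length, Equation (7)
    return nu_hat, a_inverse(eta)

# Usage: angles between random unit vectors in R^d should give tau_hat close to d (Theorem 2).
rng = np.random.default_rng(2)
d = 50
a = rng.standard_normal((20000, d))
b = rng.standard_normal((20000, d))
cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
nu_hat, tau_hat = vm_ml(np.arccos(np.clip(cos, -1.0, 1.0)))
print(nu_hat, tau_hat)
```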
Once an estimate of the VM pdf is obtained, we need to compare it with those computed on synthetic data of known ids. To this aim, a closed form of the KL divergence between two VM pdfs of parameters $(\nu_1, \tau_1)$ and $(\nu_2, \tau_2)$ is defined in [3D] as:

$$KL_{\nu,\tau} = \mathcal{KL}\big(q(\cdot; \nu_1, \tau_1), q(\cdot; \nu_2, \tau_2)\big) = \int_{-\pi}^{\pi} q(\theta; \nu_1, \tau_1) \log\frac{q(\theta; \nu_1, \tau_1)}{q(\theta; \nu_2, \tau_2)}\,d\theta$$

$$= \log\frac{I_0(\tau_2)}{I_0(\tau_1)} + \frac{I_1(\tau_1)}{I_0(\tau_1)}\big(\tau_1 - \tau_2\cos(\nu_2 - \nu_1)\big) \qquad (9)$$
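Equation (9) can be sketched in plain Python as below and compared with direct numerical integration. The Bessel functions are evaluated from their power series in log space, a choice made here only to keep the example self-contained (a library routine would normally be used); all names are illustrative.

```python
import math

def log_bessel_i(order, x, extra_terms=60):
    """log I_order(x) for integer order >= 0 and x >= 0, from the power series in log space."""
    if x == 0:
        return 0.0 if order == 0 else -math.inf
    half_log = math.log(x / 2.0)
    logs = [(2 * m + order) * half_log - math.lgamma(m + 1) - math.lgamma(m + order + 1)
            for m in range(int(x) + extra_terms)]
    top = max(logs)                                   # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))

def kl_vm(nu1, tau1, nu2, tau2):
    """Closed form of Equation (9): KL(q(.; nu1, tau1) || q(.; nu2, tau2))."""
    a_tau1 = 0.0 if tau1 == 0 else math.exp(log_bessel_i(1, tau1) - log_bessel_i(0, tau1))
    return (log_bessel_i(0, tau2) - log_bessel_i(0, tau1)
            + a_tau1 * (tau1 - tau2 * math.cos(nu2 - nu1)))

def kl_vm_numeric(nu1, tau1, nu2, tau2, n=20000):
    """Midpoint-rule integral of q1 * log(q1 / q2) over [-pi, pi], for comparison."""
    c1 = math.log(2 * math.pi) + log_bessel_i(0, tau1)
    c2 = math.log(2 * math.pi) + log_bessel_i(0, tau2)
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        t = -math.pi + (j + 0.5) * h
        lq1 = tau1 * math.cos(t - nu1) - c1
        lq2 = tau2 * math.cos(t - nu2) - c2
        total += math.exp(lq1) * (lq1 - lq2)
    return total * h

print(kl_vm(1.5, 8.0, 1.5, 8.0))                                      # 0 for identical parameters
print(kl_vm(1.5, 8.0, 1.4, 20.0), kl_vm_numeric(1.5, 8.0, 1.4, 20.0))  # closed form vs. quadrature
```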



3.3 Combining Angle and Norm Concentration 

In the previous sections we described the basic theory laying the foundations for an id estimator exploiting the information conveyed by the concentration of norms and angles. To provide a single technique that combines these pieces of information, we should compare the joint empirical pdf $h(r, \theta)$ related to the real dataset with the D theoretical pdfs, which will be referred to as $h_d(r, \theta)$ (where $d \in \{1..D\}$). Summarizing, the id estimate we want to compute is:

$$\hat d = \arg\min_{1 \le d \le D} \int_{-\pi}^{\pi}\int_{0}^{1} h_d(r, \theta) \log\frac{h_d(r, \theta)}{h(r, \theta)}\,dr\,d\theta$$

Note that the theoretical $h_d$ is not easily derivable, nor can the joint pdf h be precisely estimated. Luckily, the norm distribution $g(r; k, d)$ and the angle distribution $q(\theta; \nu, \tau)$ are independent when the data are uniformly drawn from a spherical distribution [29]: therefore the joint pdf factorizes into the product of the two marginals, i.e. $h_d(r, \theta) = g(r; k, d)\, q(\theta; \nu, \tau)$, so that the KL divergence between $h_d(r, \theta)$ and $h(r, \theta)$ becomes:

$$KL_{d,\nu,\tau} = \mathcal{KL}\big(h_d(r, \theta), h(r, \theta)\big) = KL_d + KL_{\nu,\tau} \qquad (10)$$

This fact allows us to split the joint $KL_{d,\nu,\tau}$ into the sum of the two closed-form divergences reported in Equations (3) and (9); it follows that the id estimator exploited in our algorithm becomes: $\hat d = \arg\min_{1 \le d \le D} KL_{d,\nu,\tau}$.
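Equation (10) relies on the general fact that the KL divergence between two product distributions equals the sum of the KL divergences of their factors. A toy check on discrete distributions follows; the probability vectors are arbitrary stand-ins for the norm and angle pdfs, not quantities from our experiments.

```python
import math

def kl_discrete(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def outer(p, q):
    """Joint pmf of two independent variables, flattened into a single list."""
    return [pi * qj for pi in p for qj in q]

g_d, g_hat = [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]       # stand-ins for the two norm pdfs
q_d, q_hat = [0.1, 0.6, 0.3], [0.25, 0.5, 0.25]    # stand-ins for the two angle pdfs
joint = kl_discrete(outer(g_d, q_d), outer(g_hat, q_hat))
split = kl_discrete(g_d, g_hat) + kl_discrete(q_d, q_hat)
print(joint, split)  # equal up to floating-point rounding
```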



4 The Algorithm 

In this section we show how the theoretical results presented in Section 3 can be exploited to estimate the id of a given dataset by combining the information expressed by the angles and by the minimum neighbor distances.

More precisely, we consider a manifold $\mathcal{M} \equiv \mathbb{R}^d$ embedded in a higher dimensional space $\mathbb{R}^D$ through a locally isometric nonlinear smooth map $\phi: \mathcal{M} \to \mathbb{R}^D$, and a sample set $X_N = \{x_i\}_{i=1}^{N} = \{\phi(z_i)\}_{i=1}^{N} \subset \mathbb{R}^D$, where the $z_i$ are independent identically distributed points drawn from $\mathcal{M}$ according to a non-uniform smooth pdf $f: \mathcal{M} \to \mathbb{R}^+$.
To estimate the id of $\mathcal{M}$, for each point $x_i \in X_N$ we find the set of k+1 ($1 \le k \le N-2$) nearest neighbors $X_{k+1} = X_{k+1}(x_i) = \{x_j\}_{j=1}^{k+1} \subset X_N$. Calling $\hat x = \hat x_{k+1}(x_i) \in X_{k+1}$ the farthest neighbor of $x_i$, we calculate the distance between $x_i$ and its nearest neighbor in $X_{k+1}$, and we normalize it by means of the distance between $x_i$ and $\hat x$. More precisely:

$$\rho(x_i) = \min_{x_j \in X_{k+1}} \frac{\|x_i - x_j\|}{\|x_i - \hat x\|} \qquad (11)$$
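A brute-force sketch of Equation (11), assuming NumPy and a dataset small enough for a dense N × N distance matrix; for large N a faster kNN search would replace the dense matrix. The function name is illustrative.

```python
import numpy as np

def normalized_nn_distances(X, k):
    """rho(x_i) of Equation (11) for every row of X (brute-force kNN)."""
    sq = np.sum(X ** 2, axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    nn = np.sort(dist, axis=1)[:, : k + 1]      # distances to the k+1 nearest neighbors
    return nn[:, 0] / nn[:, k]                  # nearest distance over farthest of the k+1

rng = np.random.default_rng(4)
rho = normalized_nn_distances(rng.random((500, 3)), k=10)
print(rho.min(), rho.max())                     # all values lie in (0, 1]
```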

This equation is used to compute a vector of normalized distances $\mathbf{r} = \{r_i\}_{i=1}^{N} = \{\rho(x_i)\}_{i=1}^{N}$. By employing Equation (7) in [28], we compute the ML estimate by numerically solving the optimization problem $\hat d_{ML} = \arg\max_{1 \le d \le D} \ell(d)$, where:

$$\ell(d) = N \log(kd) + (d-1)\sum_{i=1}^{N} \log \rho(x_i) + (k-1)\sum_{i=1}^{N} \log\left(1 - \rho^d(x_i)\right)$$
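Since $\ell(d)$ is concave in d (its second derivative is a sum of negative terms), a one-dimensional search over [1, D] suffices. The sketch below assumes NumPy and uses golden-section search, which is our choice for illustration and not necessarily the optimizer used in [28]; the usage lines draw distances from g by inverting its CDF $F(r) = 1 - (1 - r^d)^k$.

```python
import numpy as np

def log_likelihood(d, rho, k):
    """l(d) from Section 4 for a vector rho of normalized distances in (0, 1)."""
    n = rho.size
    return (n * np.log(k * d) + (d - 1) * np.sum(np.log(rho))
            + (k - 1) * np.sum(np.log1p(-rho ** d)))

def ml_dimension(rho, k, d_max, tol=1e-6):
    """Maximize the concave l(d) over [1, d_max] by golden-section search."""
    lo, hi = 1.0, float(d_max)
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if log_likelihood(a, rho, k) < log_likelihood(b, rho, k):
            lo = a          # the maximum lies to the right of a
        else:
            hi = b          # the maximum lies to the left of b
    return 0.5 * (lo + hi)

# Usage: inverse-CDF draws from g(r; k, d) with d = 4, then recover d.
rng = np.random.default_rng(3)
k, true_d = 10, 4.0
u = rng.random(5000)
rho = (1.0 - (1.0 - u) ** (1.0 / k)) ** (1.0 / true_d)
d_hat = ml_dimension(rho, k, d_max=20)
print(d_hat)   # close to 4
```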

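Similarly, for each point $x_i \in X_N$ we find its k nearest neighbors $X_k$ and we center them by means of a translation, obtaining $\tilde X_k = \{x_j - x_i : \forall x_j \in X_k\}$; next, we calculate the $\binom{k}{2}$ angles of all the possible pairs of vectors in $\tilde X_k$, as follows:

$$\theta(\tilde x_l, \tilde x_j) = \arccos\frac{\tilde x_l \cdot \tilde x_j}{\|\tilde x_l\|\,\|\tilde x_j\|} \qquad (12)$$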
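The centering and the $\binom{k}{2}$ pairwise angles of Equation (12) for a single neighborhood can be sketched as follows, assuming NumPy; the function name and the brute-force neighbor search are illustrative choices.

```python
import numpy as np

def neighborhood_angles(X, i, k):
    """The C(k, 2) pairwise angles of Equation (12) among the centered k nearest neighbors of X[i]."""
    diffs = X - X[i]                                   # translate so that X[i] is the origin
    dist = np.linalg.norm(diffs, axis=1)
    dist[i] = np.inf                                   # exclude the point itself
    V = diffs[np.argsort(dist)[:k]]
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    cos = np.clip(V @ V.T, -1.0, 1.0)
    rows, cols = np.triu_indices(k, 1)                 # all pairs l < j
    return np.arccos(cos[rows, cols])

rng = np.random.default_rng(5)
X = rng.standard_normal((300, 10))
theta = neighborhood_angles(X, i=0, k=8)
print(theta.size)   # C(8, 2) = 28 angles
```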

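For each neighborhood we compute a vector $\boldsymbol\theta_i = \{\theta(\tilde x_l, \tilde x_j)\}_{1 \le l < j \le k}$ by means of Equation (12). Since $\boldsymbol\theta_i$ follows a VM pdf of parameters ν and τ (see Section 3.2), we estimate their values by employing the ML approach described in Equations (6)-(8) for each set of neighbors, thus obtaining the vectors $\hat{\boldsymbol\nu} = \{\hat\nu_i\}_{i=1}^{N}$ and $\hat{\boldsymbol\tau} = \{\hat\tau_i\}_{i=1}^{N}$, and their means $\mu_\nu = N^{-1}\sum_{i=1}^{N}\hat\nu_i$ and $\mu_\tau = N^{-1}\sum_{i=1}^{N}\hat\tau_i$.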

Moreover, for each dimensionality $d \in \{1..D\}$ we uniformly draw a set of N points $Y_{N,d} = \{y_i\}_{i=1}^{N}$ from the unit d-dimensional hypersphere², and we similarly compute a vector of normalized distances $\mathbf{r}_d = \{r_{i,d}\}_{i=1}^{N} = \{\rho(y_i)\}_{i=1}^{N}$ and its ML estimate $\hat d_{d,ML}$. Next, we calculate the vectors of the VM distribution parameters $\hat{\boldsymbol\nu}_d = \{\hat\nu_{i,d}\}_{i=1}^{N}$ and $\hat{\boldsymbol\tau}_d = \{\hat\tau_{i,d}\}_{i=1}^{N}$, together with their means $\mu_{\nu,d}$ and $\mu_{\tau,d}$.

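Finally, we compose Equations (3) and (9) as reported in Equation (10), thus obtaining the following id estimate:

$$\hat d = \arg\min_{d \in \{1..D\}} \mathcal{KL}\big(g(\cdot; k, \hat d_{ML}), g(\cdot; k, \hat d_{d,ML})\big) + \mathcal{KL}\big(q(\cdot; \mu_\nu, \mu_\tau), q(\cdot; \mu_{\nu,d}, \mu_{\tau,d})\big) \qquad (13)$$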

We call this id estimator DANCo (Dimensionality from Angle and Norm Concentration). Its time complexity is $O(D^2 N \log N)$, and it is dominated by the time complexity of the kNN algorithm ($O(DN \log N)$), which is run on the dataset and on each of the D synthetic datasets.

Considering Theorem 4 in [S], which ensures that geodesic distances in the infinitesimal ball converge to Euclidean distances with probability 1, and the results in Theorems 1 and 2, Equation (13) represents a consistent estimator for the id of the manifold $\mathcal{M}$.

² Notice that a d-dimensional vector randomly sampled from a d-dimensional hypersphere according to the uniform pdf can be generated by drawing a point y from a standard normal distribution $\mathcal{N}(\cdot \,|\, 0_d, 1)$ and by scaling its norm.
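The footnote leaves the norm rescaling implicit. One standard recipe, assumed in this sketch (NumPy; illustrative function name), normalizes the Gaussian draw onto the unit sphere and then sets its norm to $u^{1/d}$ with u uniform in [0, 1], which gives $P(\|y\| \le r) = r^d$ as required for a uniform ball; the mean norm is then d/(d+1).

```python
import numpy as np

def uniform_ball(n, d, rng):
    """n points drawn uniformly from the unit d-dimensional ball."""
    y = rng.standard_normal((n, d))                       # isotropic Gaussian draws
    y /= np.linalg.norm(y, axis=1, keepdims=True)         # project onto the unit sphere S^(d-1)
    return y * (rng.random(n) ** (1.0 / d))[:, None]      # rescale norms so P(||y|| <= r) = r^d

rng = np.random.default_rng(6)
pts = uniform_ball(20000, 3, rng)
print(np.linalg.norm(pts, axis=1).mean())   # close to d / (d + 1) = 0.75
```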
5 Algorithm Evaluation

In this section we describe the datasets employed in our experiments (see Section 5.1), we summarize the adopted experimental settings (see Section 5.2), and we report the results achieved by the proposed algorithm, comparing them to those obtained by state of the art id estimators (see Section 5.3).

5.1 Dataset Description 

To evaluate our algorithm, we have performed experiments on the 17 synthetic and 5 real datasets reported in Table 1. In detail, to generate 15 synthetic datasets we have employed the tool proposed in [TJ5], extending it to produce the datasets $\mathcal{M}_{13}$ and $\mathcal{M}_{14}$ by drawing points from nonlinearly embedded manifolds having high id. More precisely, to generate $\mathcal{M}_{13}$ we have proceeded as follows: starting from 2500 points $\{x_i\}_{i=1}^{2500}$ uniformly drawn in $[0,1]^{18}$, we multiplied each $x_i$ first by $\sin(\cos(2\pi x_i))$, then by $\cos(\sin(2\pi x_i))$, obtaining points in $[0,1]^{36}$ after a concatenation of the above coordinates. The dataset $\mathcal{M}_{13}$, containing 2500 points in $[0,1]^{72}$, was finally obtained by duplicating each point's coordinates; this dataset, whose id is 18, is composed of points drawn from a manifold nonlinearly embedded in $\mathbb{R}^{72}$. The dataset $\mathcal{M}_{14}$ was similarly generated starting from the same number of uniformly sampled points in $\mathbb{R}^{24}$.
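The construction of $\mathcal{M}_{13}$ and $\mathcal{M}_{14}$ can be sketched as below, assuming NumPy. The text does not state whether the products are taken coordinate-wise, nor how coordinates are duplicated; this sketch assumes element-wise products and adjacent duplication of each coordinate, and the helper name make_m13 is hypothetical.

```python
import numpy as np

def make_m13(n=2500, latent_dim=18, seed=0):
    """Sketch of the M13 construction: n points with id = latent_dim in R^(4 * latent_dim)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n, latent_dim))                  # uniform samples in [0, 1]^latent_dim
    a = x * np.sin(np.cos(2 * np.pi * x))            # first nonlinear map (element-wise)
    b = x * np.cos(np.sin(2 * np.pi * x))            # second nonlinear map (element-wise)
    y = np.concatenate([a, b], axis=1)               # 2 * latent_dim coordinates
    return np.repeat(y, 2, axis=1)                   # duplicate every coordinate

m13 = make_m13()                 # id 18, embedded in R^72
m14 = make_m13(latent_dim=24)    # id 24, embedded in R^96
print(m13.shape, m14.shape)
```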

The real datasets employed are: the ISOMAP face database [36], the MNIST database [25], the Santa Fe dataset [32], the Isolet dataset [13], and the DSVC1 time series [I].

The ISOMAP face database consists of 698 gray-level images of size 64 × 64 depicting the face of a sculpture. This dataset has three degrees of freedom: two for the pose and one for the lighting direction.

The MNIST database consists of 70000 gray-level images of size 28 × 28 of hand-written digits; in our tests we used the 6742 training points representing the digit 1. The id of this database is not actually known; we therefore rely on the estimates proposed in [19, 9] for the different digits, and in particular on the range {8..11} for the digit 1.

The version D2 of the Santa Fe dataset is a synthetic time series of 50000 one-dimensional points; it was generated by a simulation of particle motion, and it has nine degrees of freedom. In order to estimate the attractor dimension of this time series, we used the method of delays described in [31], which generates D-dimensional vectors by collecting D values from the original dataset; by choosing D = 50 we obtained a dataset containing 1000 points in $\mathbb{R}^{50}$.
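The sizes quoted for Santa Fe (50000 values giving 1000 vectors in $\mathbb{R}^{50}$) and for DSVC1 below (5000 values giving 250 vectors in $\mathbb{R}^{20}$) are consistent with cutting the series into consecutive, non-overlapping windows of length D. A sketch under that assumption follows (NumPy; illustrative function name); delay embeddings built from overlapping windows with a lag would instead yield many more vectors.

```python
import numpy as np

def delay_vectors(series, dim):
    """Split a 1-D series into consecutive, non-overlapping windows of length dim."""
    series = np.asarray(series, dtype=float)
    n = len(series) // dim                  # drop any incomplete final window
    return series[: n * dim].reshape(n, dim)

santa_fe_like = delay_vectors(np.arange(50000), 50)
dsvc1_like = delay_vectors(np.arange(5000), 20)
print(santa_fe_like.shape, dsvc1_like.shape)   # (1000, 50) (250, 20)
```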

The Isolet dataset has been generated as follows: 150 subjects spoke the name of each letter of the alphabet twice, thus producing 52 training examples from each speaker. The speakers are grouped into sets of 30 speakers each, referred to as isolet1, isolet2, isolet3, isolet4, and isolet5, for a total of 7797 samples. The id of this dataset is not actually known, but a study reported in [24] has suggested that the correct estimate could be in the range {16..22}.
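Table 1. Brief description of the 17 synthetic and 5 real datasets employed in our experiments, where d is the id and D is the embedding space dimension.

| Type | Dataset | d | D | Description |
|---|---|---|---|---|
| Synthetic | $\mathcal{M}_{1}$ | 10 | 11 | Uniformly sampled sphere linearly embedded. |
| Synthetic | $\mathcal{M}_{2}$ | 3 | 5 | Affine space. |
| Synthetic | $\mathcal{M}_{3}$ | 4 | 6 | Concentrated figure, confusable with a 3d one. |
| Synthetic | $\mathcal{M}_{4}$ | 4 | 8 | Nonlinear manifold. |
| Synthetic | $\mathcal{M}_{5}$ | 2 | 3 | 2-d Helix. |
| Synthetic | $\mathcal{M}_{6}$ | 6 | 36 | Nonlinear manifold. |
| Synthetic | $\mathcal{M}_{7}$ | 2 | 3 | Swiss-Roll. |
| Synthetic | $\mathcal{M}_{8}$ | 20 | 20 | Affine space. |
| Synthetic | $\mathcal{M}_{9a}$ | 10 | 11 | Uniformly sampled hypercube. |
| Synthetic | $\mathcal{M}_{9b}$ | 17 | 18 | Uniformly sampled hypercube. |
| Synthetic | $\mathcal{M}_{9c}$ | 24 | 25 | Uniformly sampled hypercube. |
| Synthetic | $\mathcal{M}_{9d}$ | 70 | 71 | Uniformly sampled hypercube. |
| Synthetic | $\mathcal{M}_{10}$ | 2 | 3 | Moebius band 10-times twisted. |
| Synthetic | $\mathcal{M}_{11}$ | 20 | 20 | Isotropic multivariate Gaussian. |
| Synthetic | $\mathcal{M}_{12}$ | 1 | 13 | Curve. |
| Synthetic | $\mathcal{M}_{13}$ | 18 | 72 | Nonlinear manifold. |
| Synthetic | $\mathcal{M}_{14}$ | 24 | 96 | Nonlinear manifold. |
| Real | $\mathcal{M}_{Faces}$ | 3 | 4096 | ISOMAP face dataset. |
| Real | $\mathcal{M}_{MNIST1}$ | 8-11 | 784 | MNIST database (digit 1). |
| Real | $\mathcal{M}_{SantaFe}$ | 9 | 50 | Santa Fe dataset (version D2). |
| Real | $\mathcal{M}_{Isolet}$ | 16-22 | 617 | Spoken letters of the alphabet. |
| Real | $\mathcal{M}_{DSVC1}$ | 2.26 | 20 | Real time series of a Chua's circuit. |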



The DSVC1 is a real data time series composed of 5000 samples and measured from a hardware realization of Chua's circuit [6]. We used the method of delays choosing D = 20, and we obtained a dataset containing 250 points in $\mathbb{R}^{20}$; the id of this dataset is ≈ 2.26, as reported in [4].

5.2 Experimental Setting 

To objectively assess our method, we compared it with well-known id estimators such as SPPCA, kNNG, CD, MLE, Hein, BPCA, MiND_KL, and IDEA. For kNNG, MLE, Hein, BPCA, MiND_KL, and IDEA we used the authors' implementations, while for the other algorithms we employed the versions provided by the dimensionality reduction toolbox.

To generate the synthetic datasets we adopted the modified generator described in Section 5.1, creating 20 instances of each dataset reported in Table 1, each of which is composed of 2500 randomly sampled points.

http://www.eecs.umich.edu/~hero/IntrinsicDim/
http://www.stat.lsa.umich.edu/~elevina/mledim.m
http://www.ml.uni-saarland.de/code.shtml
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/blogs/bayesianpca.aspx
http://security.dico.unimi.it/~fox721/
http://cseweb.ucsd.edu/~lvdmaaten/dr/download.php
Table 2. Parameter settings for the different estimators: k represents the number of neighbors, γ is the edge weighting factor for kNNG, M is the number of Least Square (LS) runs, N is the number of resampling trials per LS iteration, α and π represent the parameters (shape and rate) of the Gamma prior distributions describing the hyperparameters and the observation noise model of BPCA, and μ contains the mean and the precision of the Gaussian prior distribution describing the bias inserted in the inference of BPCA.

| Dataset | Method | Parameters |
|---|---|---|
| Synthetic | SPPCA | None |
| Synthetic | CD | None |
| Synthetic | MLE | k1 = 6, k2 = 20 |
| Synthetic | kNNG1 | k1 = 6, k2 = 20, γ = 1, M = 1, N = 10 |
| Synthetic | kNNG2 | k1 = 6, k2 = 20, γ = 1, M = 10, N = 1 |
| Synthetic | BPCA | iters = 500, α = (2.0, 2.0), π = (2.0, 2.0), μ = (0.0, 0.01) |
| Synthetic | MiND_KL | k = 10 |
| Synthetic | IDEA | k = 10 |
| Synthetic | DANCo | k = 10 |
| Real | SPPCA | None |
| Real | CD | None |
| Real | MLE | k1 = 3, k2 = 8 |
| Real | kNNG1 | k1 = 3, k2 = 8, γ = 1, M = 1, N = 10 |
| Real | kNNG2 | k1 = 3, k2 = 8, γ = 1, M = 10, N = 1 |
| Real | BPCA | iters = 2000, α = (2.0, 2.0), π = (2.0, 2.0), μ = (0.0, 0.01) |
| Real | MiND_KL | k = 5 |
| Real | IDEA | k = 5 |
| Real | DANCo | k = 5 |



To obtain an unbiased estimate, for each technique we averaged the results
achieved on the 20 instances. To perform multiple tests also on M_MNIST1 and
M_Isolet, we extracted 5 random subsets of 2500 points each and averaged the
results.
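The subset-averaging protocol above can be sketched as a small helper. The sketch assumes that each subset is drawn without replacement and that different subsets are drawn independently (so they may overlap); the text does not specify either point. The names `average_over_subsets` and `estimator` are hypothetical, and `estimator` stands for any function mapping a list of points to an id estimate.

```python
import random
from statistics import mean


def average_over_subsets(data, estimator, n_subsets=5, subset_size=2500, seed=0):
    """Run `estimator` on `n_subsets` random subsets of `data` and average the results.

    Each subset is drawn without replacement (an assumption of this sketch);
    subsets are drawn independently of each other and may overlap.
    """
    if subset_size > len(data):
        raise ValueError("subset_size exceeds the number of available points")
    rng = random.Random(seed)  # seeded generator, so the protocol is reproducible
    return mean(estimator(rng.sample(data, subset_size)) for _ in range(n_subsets))


if __name__ == "__main__":
    # Placeholder data and a placeholder "estimator" (the subset size), for illustration only.
    points = [(float(i), float(i % 7)) for i in range(10000)]
    print(average_over_subsets(points, len))
```

In practice `estimator` would wrap one of the id estimators compared here, called with its Table 2 parameters.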

Table 2 summarizes the configuration parameters employed in our tests. To reduce
the dependency of the kNNG algorithm on the choice of its parameter k, we
performed multiple runs with $k_1 \leq k \leq k_2$ and averaged the results (see
Table 2).
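Averaging over a range of neighborhood sizes is easiest to see on the MLE estimator, which Table 2 also configures with $k_1$ and $k_2$. The sketch below is a minimal pure-Python version of the Levina-Bickel maximum likelihood estimator averaged over $k_1 \leq k \leq k_2$; it is not the authors' implementation used in the experiments. It assumes brute-force neighbor search and the normalization $1/(k-2)$, which makes each per-point term unbiased when the data are locally uniform (a variant with $1/(k-1)$ is also common). The helper names are hypothetical.

```python
import math
import random


def knn_distances(points, k_max):
    """Sorted distances from each point to its k_max nearest neighbours (brute force)."""
    rows = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rows.append(dists[:k_max])
    return rows


def mle_id(points, k1, k2):
    """Levina-Bickel MLE of the intrinsic dimension, averaged over k1 <= k <= k2.

    For each k the per-point estimate is (k - 2) / sum_{j=1}^{k-1} log(T_k / T_j),
    where T_j is the distance to the j-th nearest neighbour; it is averaged over
    all points, and the resulting values are averaged over k.
    """
    if k1 < 3 or k2 < k1:
        raise ValueError("need 3 <= k1 <= k2")
    rows = knn_distances(points, k2)
    per_k = []
    for k in range(k1, k2 + 1):
        total = 0.0
        for row in rows:
            t_k = row[k - 1]  # distance to the k-th neighbour
            s = sum(math.log(t_k / row[j]) for j in range(k - 1))
            total += (k - 2) / s
        per_k.append(total / len(rows))
    return sum(per_k) / len(per_k)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical test data: a 2-D square mapped linearly into R^5.
    plane = []
    for _ in range(800):
        u, v = random.random(), random.random()
        plane.append((u, v, u + v, u - v, 0.5 * u))
    print(round(mle_id(plane, k1=6, k2=20), 2))
```

On such planar data the printed value should be close to 2, possibly slightly below it because points near the border of the square have truncated neighborhoods.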

5.3 Experimental Results 

This section reports the results achieved on both synthetic and real datasets.
Table 3 summarizes the results obtained on the synthetic datasets, where the best
performing algorithm is DANCo. Indeed, this estimator correctly deals with linear
and nonlinear manifolds embedded in both low and high dimensional spaces. In
particular, it is the only method that achieves a good estimate on the three
datasets M_10d, M_N1, and M_N2.

Instead, geometrical approaches, such as kNNG, CD, MLE, and Hein, obtain 
good estimates only for low id manifolds, failing to deal with high id data. 
Table 3. Results achieved on the synthetic datasets. The best approximations are
highlighted in boldface.

Dataset  d     SPPCA   BPCA    kNNG1   kNNG2   CD      MLE     Hein    MiND_KL IDEA    DANCo
M_13     1     3.00    5.70    0.97    1.07    1.14    1.00    1.00    1.00    1.02    1.00
M_5      2     3.00    2.00    1.96    2.06    1.98    1.97    2.00    2.00    2.00    2.00
M_7      2     3.00    2.00    1.97    2.09    1.93    1.96    2.00    2.00    2.07    2.00
M_11     2     3.00    1.55    1.95    2.03    2.19    2.21    2.00    2.00    1.98    2.00
M_2      3     3.00    3.00    2.95    3.03    2.88    2.88    3.00    3.00    3.03    3.00
M_3      4     4.00    4.00    3.75    3.82    3.23    3.83    4.00    4.00    4.01    4.00
M_4      4     8.00    4.25    4.05    4.76    3.88    3.95    4.00    4.15    3.93    4.00
M_6      6     12.00   12.00   6.46    11.24   5.91    6.39    5.95    6.50    6.33    6.90
M_1      10    11.00   5.45    9.16    9.89    9.12    9.10    9.45    10.30   10.41   10.00
M_10a    10    10.00   5.20    8.62    10.21   8.09    8.26    8.90    9.85    9.93    9.50
M_10b    17    17.00   9.46    13.69   15.38   12.30   12.87   13.85   16.25   16.07   16.47
M_N1     18    36.00   36.00   17.58   5.01    11.60   15.95   14.00   18.60   17.30   18.20
M_9      20    20.00   13.55   15.25   10.59   13.75   14.64   15.50   19.15   18.51   19.54
M_12     20    20.00   13.70   16.40   24.89   11.26   15.82   15.00   19.35   21.20   19.90
M_10c    24    24.00   13.30   17.67   21.42   15.58   16.96   17.95   22.55   23.93   23.85
M_N2     24    48.00   48.00   19.66   22.80   14.03   19.83   17.00   25.30   22.90   25.00
M_10d    70    71.00   71.00   39.67   40.31   31.40   36.49   38.69   65.30   46.70   70.42
MPE (%)  -     44.79   61.55   11.72   20.14   20.79   13.78   12.04   2.94    4.75    1.90



Moreover, the projection techniques, such as BPCA and SPPCA, correctly deal only
with linearly embedded manifolds. These considerations confirm that the geometric
methods are affected by an underestimation bias, as noted in [28, 34], and that
projection methods cannot provide reliable id estimates [26].

Furthermore, DANCo also outperforms IDEA and MiND_KL, which were developed to deal
with datasets having a sufficiently high id (that is, id ≥ 10) drawn from manifolds
nonlinearly embedded in higher dimensional spaces.

The last row of Table 3 reports the Mean Percentage Error (MPE) indicator, proposed
in [28] to evaluate the overall performance of a given estimator. For each algorithm
this value is computed as the mean of the percentage errors obtained on each
dataset, i.e. $\mathrm{MPE} = \frac{100}{\#\mathcal{M}} \sum_{\mathcal{M}} \frac{|\hat{d}_{\mathcal{M}} - d_{\mathcal{M}}|}{d_{\mathcal{M}}}$,
where $d_{\mathcal{M}}$ is the real id, $\hat{d}_{\mathcal{M}}$ is the estimated one,
and $\#\mathcal{M}$ is the number of tested manifolds. According to this indicator,
DANCo ranks as the best performing estimator.
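The MPE definition translates directly into code. In the minimal sketch below, the function name `mean_percentage_error` is hypothetical, and a true id given as a range (such as the 8-11 range reported for one of the real datasets) is represented by a `(low, high)` tuple and replaced by its midpoint, following the footnote convention for datasets whose true id is not known.

```python
def mean_percentage_error(true_ids, estimated_ids):
    """MPE = 100 / #M * sum over the tested manifolds of |d_hat - d| / d.

    A true id given as a (low, high) tuple is replaced by the midpoint of the
    range, as done for the real datasets whose exact id is unknown.
    """
    if not true_ids or len(true_ids) != len(estimated_ids):
        raise ValueError("need one estimate per tested dataset")
    total = 0.0
    for d_true, d_hat in zip(true_ids, estimated_ids):
        if isinstance(d_true, tuple):
            low, high = d_true
            d_true = (low + high) / 2.0
        total += abs(d_hat - d_true) / d_true
    return 100.0 * total / len(true_ids)


# True ids of the real datasets, in the row order of Table 4 (ranges as tuples).
real_ids = [2.26, 3, 9, (8, 11), (16, 22)]
idea = [2.14, 3.73, 7.26, 11.06, 18.77]
danco = [2.26, 4.00, 8.19, 9.98, 19.00]
print(round(mean_percentage_error(real_ids, idea), 2))   # 13.32
print(round(mean_percentage_error(real_ids, danco), 2))  # 9.48
```

The IDEA column reproduces the reported 13.32; the DANCo column gives 9.48 against the reported 9.47, a gap presumably due to rounding of the tabulated estimates.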

Table 4 summarizes the results obtained on the real datasets. Since real data are
generally affected by noise, the quality of the estimates computed by the projection
methods is strongly reduced, as confirmed by the poor results obtained by BPCA and
SPPCA. The geometric approaches we tested are less affected by noise, but they
cannot correctly deal with the high dimensionality of the M_Isolet dataset.

As can be seen, DANCo is the best performing estimator, clearly improving also on
the results obtained by techniques, such as IDEA and MiND_KL, that exploit a
correction approach. These results, together with the best average estimation
precision achieved by our technique in terms of MPE, confirm that DANCo is a
promising and valuable tool for id estimation.

Table 4. Results achieved on the real datasets by the employed approaches. The best
approximations are highlighted in boldface.

Dataset     d      SPPCA   BPCA    kNNG1   kNNG2   CD      MLE     Hein    MiND_KL IDEA    DANCo
-           2.26   4.00    6.00    1.77    1.86    1.92    2.03    3.00    2.50    2.14    2.26
M_Faces     3      5.00    4.00    3.60    4.32    3.37    4.05    3.00    3.90    3.73    4.00
M_SantaFe   9      19.00   18.00   7.28    7.43    4.39    7.16    6.00    7.60    7.26    8.19
M_MNIST1    8-11   9.00    11.00   10.37   9.58    6.96    10.29   8.00    11.00   11.06   9.98
M_Isolet    16-22  45.00   19.00   6.50    8.32    3.65    15.78   3.00    20.00   18.77   19.00
MPE (%)     -      79.37   62.92   27.14   27.24   37.22   18.17   33.21   15.44   13.32   9.47

Finally, to test the robustness of our algorithm with respect to the choice of the
parameter k, we employed DANCo to reproduce the experiments proposed for the MLE
algorithm in Figure 1(a) of [26] and in Figure 2 of [35], averaging the curves
obtained over 10 runs. In these tests the datasets are composed of points drawn
from the standard Gaussian pdf in $\mathbb{R}^5$. We repeated the test for datasets
with cardinalities $N \in \{200, 500, 1000, 2000\}$, varying the parameter k in the
range $\{5, \dots, 100\}$. For all combinations of dataset cardinality and k, DANCo
always estimated an id equal to 5, confirming its strong robustness.
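The structure of this test can be sketched in code under a simplifying assumption: the full DANCo procedure also uses angle statistics and a KL-based comparison with synthetic data, so the sketch below substitutes only a norm-based ingredient. It computes the normalized distance $\rho(x) = r_1(x)/r_{k+1}(x)$ for every point and maximizes the likelihood under $g(\rho; k, d) = k d \rho^{d-1}(1-\rho^d)^{k-1}$, the density of $\rho$ when the $k$ closest neighbors are uniformly distributed in a $d$-dimensional ball (the statistic used by MiND-style estimators). The function names are hypothetical, and this reduced estimator is not claimed to share the stability in k reported for DANCo.

```python
import heapq
import math
import random


def normalized_nn_ratios(points, k):
    """rho(x) = r_1(x) / r_{k+1}(x) for every point (brute-force neighbour search)."""
    ratios = []
    for i, p in enumerate(points):
        nearest = heapq.nsmallest(
            k + 1, (math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
        ratios.append(nearest[0] / nearest[k])
    return ratios


def log_likelihood(d, ratios, k):
    """Sum of log g(rho; k, d) with g = k d rho^(d-1) (1 - rho^d)^(k-1)."""
    return sum(
        math.log(k * d) + (d - 1) * math.log(r) + (k - 1) * math.log1p(-(r ** d))
        for r in ratios
    )


def norm_ml_id(points, k, d_max=50.0, iters=100):
    """Continuous ML estimate of d. The log-likelihood is concave in d,
    so a ternary search over [0.05, d_max] finds its maximiser."""
    ratios = normalized_nn_ratios(points, k)
    lo, hi = 0.05, d_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, ratios, k) < log_likelihood(m2, ratios, k):
            lo = m1  # the maximiser lies in [m1, hi]
        else:
            hi = m2  # the maximiser lies in [lo, m2]
    return (lo + hi) / 2.0


if __name__ == "__main__":
    random.seed(0)
    for n in (200, 500):
        # Standard Gaussian samples in R^5, as in the robustness test.
        data = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
        for k in (5, 20, 50):
            print(n, k, round(norm_ml_id(data, k), 2))
```

Concavity follows because $\log d$ and $\log(1-\rho^d)$ are both concave in $d$ while $(d-1)\log\rho$ is linear; searching only over integer values of $d$ would give an integer-valued variant instead.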

6 Conclusions and Future Works 

In this paper we proposed a novel consistent estimator, called DANCo, that combines
the effects of the concentration of angles and norms to estimate the id of a given
dataset. The proposed method compares the joint pdf of norms and angles estimated
on the dataset with those computed on synthetic datasets of known id; to this aim,
a closed-form expression for the Kullback-Leibler divergence between their
distributions is employed.
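Assuming, as the keywords suggest, that the angle statistics are modeled with von Mises distributions, the divergence between two fitted angle distributions has a standard closed form: for $\mathrm{vM}(\nu_1, \tau_1)$ and $\mathrm{vM}(\nu_2, \tau_2)$ it equals $\log(I_0(\tau_2)/I_0(\tau_1)) + A(\tau_1)(\tau_1 - \tau_2\cos(\nu_1 - \nu_2))$ with $A = I_1/I_0$. The sketch below evaluates it. It is not claimed to be DANCo's exact expression, which also covers the norm distribution, and the series-based helper `bessel_i` is a hypothetical stand-in for a library routine such as SciPy's `scipy.special.i0` and `scipy.special.i1`.

```python
import math


def bessel_i(n, x, tol=1e-16):
    """Modified Bessel function I_n(x), integer order n >= 0, via the power series
    I_n(x) = sum_m (x/2)^(2m+n) / (m! (m+n)!).

    Adequate for moderate |x| (up to a few hundred); larger concentrations call
    for an exponentially scaled implementation to avoid overflow.
    """
    term = (x / 2.0) ** n / math.factorial(n)
    total = term
    m = 0
    while abs(term) > tol * abs(total):
        m += 1
        term *= (x / 2.0) ** 2 / (m * (m + n))  # ratio of consecutive series terms
        total += term
    return total


def von_mises_kl(nu1, tau1, nu2, tau2):
    """KL( vM(nu1, tau1) || vM(nu2, tau2) ) for von Mises densities
    p(theta) = exp(tau * cos(theta - nu)) / (2 pi I_0(tau)).

    Closed form: log(I_0(tau2) / I_0(tau1)) + A(tau1) * (tau1 - tau2 * cos(nu1 - nu2)),
    where A(tau) = I_1(tau) / I_0(tau) is the mean resultant length of vM(., tau).
    """
    i0_tau1 = bessel_i(0, tau1)
    a_tau1 = bessel_i(1, tau1) / i0_tau1
    return (
        math.log(bessel_i(0, tau2) / i0_tau1)
        + a_tau1 * (tau1 - tau2 * math.cos(nu1 - nu2))
    )


if __name__ == "__main__":
    print(von_mises_kl(0.0, 2.0, 0.0, 2.0))  # identical distributions: prints 0.0
    print(von_mises_kl(0.3, 2.0, -0.5, 4.0))
```

The formula follows from $E_p[\cos(\theta - \nu_2)] = A(\tau_1)\cos(\nu_1 - \nu_2)$, so no numerical integration is needed at estimation time.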

We tested our algorithm on both synthetic and real datasets, comparing its results
with those obtained by well-known id estimators. The overall results show that DANCo
is a promising and valuable technique for id estimation. Indeed, it provides the
most accurate results, computing either the best id estimates or values very close
to the best ones. Moreover, the algorithm has proven robust in its capability to:
i) deal with both high and low id, ii) manage both linearly and nonlinearly embedded
manifolds, and iii) outperform all the other estimators on noisy real datasets.

Future work will be devoted to identifying a bound for the finite sample error, in
order to further evaluate the effectiveness of the proposed approach formally.

5 Where the true value of the id is not known, we considered the mean value of the
range as $d_{\mathcal{M}}$.